The Random Forest Kerneland creating other kernels for big data from random partitions

نویسندگان

  • Alex Davies
  • Zoubin Ghahramani
چکیده

We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lend themselves to a natural approximation that is appropriate for certain big data problems, allowing O(N) inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Implementation of Random Forest Algorithm in Order to Use Big Data to Improve Real-Time Traffic Monitoring and Safety

Nowadays the active traffic management is enabled for better performance due to the nature of the real-time large data in transportation system. With the advancement of large data, monitoring and improving the traffic safety transformed into necessity in the form of actively and appropriately. Per-formance efficiency and traffic safety are considered as an im-portant element in measuring the pe...

متن کامل

Efficient Learning of Random Forest Classifier using Disjoint Partitioning Approach

Random Forest is an Ensemble Supervised Machine Learning technique. Research work in the area of Random Forest aims at either improving accuracy or improving performance. In this paper we are presenting our research towards improvement in learning time of Random Forest by proposing a new approach called Disjoint Partitioning. In this approach, we are using disjoint partitions of training datase...

متن کامل

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and if performed manually is very time consuming. This issue suggests usage of unsupervised models. One of the accurate methods in this regards is Spectral Ranking of Anomalies (SRA) that is shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

متن کامل

Random forest algorithm in big data environment

Random forest method is one of the most widely applied classification algorithms at present. From the actual big data scene and requirements, the application of random forest method in the big data environment to conduct in-depth study. Due to the big data needs to process a huge number of features at the same time, and the data pattern changes constantly over time, the accuracy of a random for...

متن کامل

Coinciding Walk Kernels: Parallel Absorbing Random Walks for Learning with Graphs and Few Labels

Exploiting autocorrelation for node-label prediction in networked data has led to great success. However, when dealing with sparsely labeled networks, common in present-day tasks, the autocorrelation assumption is difficult to exploit. Taking a step beyond, we propose the coinciding walk kernel (cwk), a novel kernel leveraging label-structure similarity – the idea that nodes with similarly arra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014